feat(data-retention): granular PII redaction stages (input + block outputs)#5272

Open: TheodoreSpeaks wants to merge 16 commits into staging from feat/pii-granular-redaction

Conversation

@TheodoreSpeaks

Collaborator

Summary

  • Add two execution-altering PII redaction stages alongside the existing log redaction: redact the workflow input before execution, and mask every block output in-flight before the next block reads it
  • Per-stage policy (entity types + language) for each of Logs / Workflow input / Block outputs; resolved most-specific-wins per workspace, with full back-compat for existing logs-only rules
  • In-flight stages fail-fast (abort the run) on a Presidio error instead of scrubbing or leaking; the logs stage keeps scrub-to-marker
  • Reuse the shared HTTP → Presidio path; block-output redaction runs before payload compaction so offloaded large values are still masked
  • Settings UI: chip-tabs across the three stages, language-first picker with the entity grid filtered to that language's recognizers, and a confirmation before removing a workspace override
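The per-stage resolution described above can be sketched roughly as follows. This is only an illustration, not the resolver in this PR: the type names, the `stages` field layout, the `"en"` default language, and the assumption that a workspace rule replaces the organization rule as a whole (rather than stage by stage) are all hypothetical.

```typescript
// Hypothetical shapes; the real types in the PR may differ.
type StageName = "logs" | "workflowInput" | "blockOutputs";

interface StagePolicy {
  enabled: boolean;
  entityTypes: string[];
  language: string;
}

// New-style rule with one optional policy per stage.
type StagedRule = { stages: Partial<Record<StageName, StagePolicy>> };
// Legacy flat rule: a list of entity types that only ever applied to logs.
type LegacyRule = { entityTypes: string[]; language?: string };
type Rule = StagedRule | LegacyRule;

const DISABLED: StagePolicy = { enabled: false, entityTypes: [], language: "en" };

function normalize(rule: Rule): Record<StageName, StagePolicy> {
  if ("stages" in rule) {
    return {
      logs: rule.stages.logs ?? DISABLED,
      workflowInput: rule.stages.workflowInput ?? DISABLED,
      blockOutputs: rule.stages.blockOutputs ?? DISABLED,
    };
  }
  // Back-compat: a flat rule maps to the logs stage and nothing else.
  return {
    logs: {
      enabled: rule.entityTypes.length > 0,
      entityTypes: rule.entityTypes,
      language: rule.language ?? "en",
    },
    workflowInput: DISABLED,
    blockOutputs: DISABLED,
  };
}

// Most-specific-wins (assumed): a workspace override beats the org default.
function resolvePolicy(
  orgRule: Rule | undefined,
  workspaceRule: Rule | undefined
): Record<StageName, StagePolicy> {
  const chosen = workspaceRule ?? orgRule;
  return chosen ? normalize(chosen) : normalize({ stages: {} });
}
```

With this shape, an existing logs-only rule keeps redacting logs and leaves both in-flight stages off until someone opts in.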

Type of Change

  • New feature

Testing

Tested manually. Added unit tests for resolver back-compat, redactObjectStrings and its failure modes, and the contract schema. bun run lint, check:api-validation:strict, and check:migrations origin/staging all pass.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel Bot commented Jun 29, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Skipped | Skipped | Jun 30, 2026 7:46pm |


@cursor

cursor Bot commented Jun 29, 2026


PR Summary

High Risk
Changes execution-time data (inputs, block outputs, streams, memory) and log persistence with fail-fast vs scrub semantics; misconfiguration or Presidio outages can abort runs or affect workflow correctness, not only observability.

Overview
Adds three independently configurable PII redaction stages (workflow input, block outputs, logs), each with its own entity types and language, while legacy flat rules still map to logs-only.

Runtime: Workflow input is masked before execution when the input stage is on. Block outputs are masked in-flight (before compaction and downstream blocks), including buffer-only streaming so raw chunks are not forwarded when that stage is enabled; policy propagates to child workflows and agent memory writes. Input/block stages abort the run on Presidio failure (onFailure: 'throw'); log persistence keeps scrub-to-marker behavior. Log redaction now uses only the logs stage, applies without the pii-redaction feature flag (stored rules are the source of truth), and hydrates large-value refs before masking so offloaded content gets the logs policy.
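The difference between the two failure modes can be illustrated with a small walker. This is a sketch under assumptions, not the PR's `redactObjectStrings`: the function name, the `MaskFn` stand-in for the Presidio HTTP call, the `"scrub"` option name, and the marker text are hypothetical; only `onFailure: 'throw'` appears in the summary above. It also masks one string at a time for brevity, whereas the PR describes batched calls.

```typescript
// Stand-in for the HTTP call to Presidio, injected to keep the sketch self-contained.
type MaskFn = (text: string) => Promise<string>;
type OnFailure = "throw" | "scrub";

// Hypothetical marker written in place of a string that could not be masked.
const SCRUB_MARKER = "[REDACTION_FAILED]";

async function redactStrings(
  value: unknown,
  mask: MaskFn,
  onFailure: OnFailure
): Promise<unknown> {
  if (typeof value === "string") {
    try {
      return await mask(value);
    } catch (err) {
      // In-flight stages: propagate so the run aborts instead of leaking.
      if (onFailure === "throw") throw err;
      // Logs stage: drop the raw string and persist a marker instead.
      return SCRUB_MARKER;
    }
  }
  if (Array.isArray(value)) {
    return Promise.all(value.map((item) => redactStrings(item, mask, onFailure)));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = await redactStrings(child, mask, onFailure);
    }
    return out;
  }
  // Numbers, booleans, null, and undefined pass through unchanged.
  return value;
}
```

Scrubbing is tolerable for logs because only observability is affected; for inputs and block outputs a marker would silently change what downstream blocks compute, which is presumably why those stages abort.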

Presidio / batching: New /analyze_batch and /anonymize_batch endpoints; masking paths chunk by shared byte/count budgets and use batched analyze/anonymize instead of per-string concurrency.
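Chunking by a shared byte and count budget might look like the sketch below. The budget field names and the greedy packing strategy are assumptions for illustration; the actual limits and helper names in the PR are not shown here.

```typescript
// Hypothetical budget shape: the most strings and the most UTF-8 bytes per request.
interface BatchBudget {
  maxItems: number;
  maxBytes: number;
}

// UTF-8 byte length computed by hand so the sketch needs no runtime-specific APIs.
function utf8Length(text: string): number {
  let bytes = 0;
  for (const ch of text) {
    const cp = ch.codePointAt(0) ?? 0;
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return bytes;
}

// Greedily pack strings into batches that respect both limits.
function chunkByBudget(texts: string[], budget: BatchBudget): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let currentBytes = 0;
  for (const text of texts) {
    const size = utf8Length(text);
    const overflow =
      current.length >= budget.maxItems || currentBytes + size > budget.maxBytes;
    if (current.length > 0 && overflow) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    // A single string larger than maxBytes still goes out alone in its own batch.
    current.push(text);
    currentBytes += size;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Each resulting batch would then be sent as one request to a batched endpoint such as /analyze_batch, instead of issuing one request per string.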

Settings & contracts: Data retention UI uses stage tabs, language-filtered entity grids, and confirm-on-remove for overrides; API/schema accept stages with validation (enabled stages must pick at least one entity type).
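The validation rule mentioned above (an enabled stage must select at least one entity type) could be expressed like this. The field names and error wording are hypothetical, and the PR's contract schema likely uses its own validation library rather than a hand-written check.

```typescript
// Hypothetical request shape for the per-stage settings.
interface StageInput {
  enabled: boolean;
  entityTypes: string[];
  language: string;
}

type StagesInput = Partial<
  Record<"logs" | "workflowInput" | "blockOutputs", StageInput>
>;

// Returns one message per invalid stage; an empty array means the payload is valid.
function validateStages(stages: StagesInput): string[] {
  const errors: string[] = [];
  for (const [name, stage] of Object.entries(stages)) {
    if (stage && stage.enabled && stage.entityTypes.length === 0) {
      errors.push(`${name}: an enabled stage must select at least one entity type`);
    }
  }
  return errors;
}
```

Rejecting this at the API boundary avoids a stage that is nominally on but masks nothing.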

Reviewed by Cursor Bugbot for commit f0c71cc. Bugbot is set up for automated code reviews on this repo. Configure here.

Comment thread apps/sim/executor/execution/block-executor.ts
@greptile-apps

greptile-apps Bot commented Jun 29, 2026

Contributor

Greptile Summary

This PR adds stage-based PII redaction for workflow execution and settings. The main changes are:

  • Separate policies for workflow input, block outputs, and logs.
  • Execution-time masking for workflow inputs and block outputs.
  • Batched Presidio analyze/anonymize endpoints and shared masking helpers.
  • Settings UI updates for per-stage language and entity selection.
  • Contract and resolver updates for staged PII policies.

Confidence Score: 4/5

This is close, but the restore path should be fixed before merging.

  • Stored rules now appear to be resolved during execution without depending on the feature flag.
  • Empty enabled stages are rejected at the API boundary.
  • Restored large-value refs can still expose raw block-output data after redaction is enabled.

apps/sim/lib/workflows/executor/execution-core.ts

Security Review

Block-output redaction can still miss raw PII stored behind old large-value refs during resume or run-from-block restore.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/lib/workflows/executor/execution-core.ts | Adds input and restored-output masking, but restored large-value refs can still bypass block-output redaction. |
| apps/sim/lib/billing/retention.ts | Resolves staged PII policies and keeps legacy rules mapped to logs-only behavior. |
| apps/sim/lib/api/contracts/primitives.ts | Adds staged PII contract validation and rejects enabled stages without selected entity types. |
| apps/sim/executor/execution/block-executor.ts | Masks block outputs before compaction and avoids forwarding raw streaming chunks when block-output redaction is active. |

Reviews (13): Last reviewed commit: "fix(data-retention): always apply logs p..." | Re-trigger Greptile

Comment thread apps/sim/lib/workflows/executor/execution-core.ts Outdated
Comment thread apps/sim/executor/execution/block-executor.ts
Comment thread apps/sim/lib/workflows/executor/execution-core.ts
@TheodoreSpeaks

Collaborator Author

@greptile review

Comment thread apps/sim/lib/workflows/executor/execution-core.ts Outdated
@TheodoreSpeaks

Collaborator Author

@greptile review

Comment thread apps/sim/executor/execution/block-executor.ts Outdated
Comment thread apps/sim/lib/workflows/executor/execution-core.ts Outdated
@TheodoreSpeaks

Collaborator Author

@greptile review

@TheodoreSpeaks

Collaborator Author

@greptile review

…redaction

# Conflicts:
#	apps/sim/ee/data-retention/components/data-retention-settings.tsx
Comment thread apps/sim/app/api/organizations/[id]/data-retention/route.ts
@TheodoreSpeaks

Collaborator Author

@greptile review

@TheodoreSpeaks

Collaborator Author

@greptile review

Comment thread apps/sim/lib/billing/retention.ts Outdated
Comment thread apps/sim/ee/data-retention/components/data-retention-settings.tsx
Comment thread apps/sim/lib/workflows/executor/execution-core.ts
@TheodoreSpeaks

Collaborator Author

@greptile review

Comment on lines +678 to +682
// Limitation: this walks inline strings only — values offloaded to
// large-value storage are still refs here and are not re-masked. In the
// normal flow that is safe (a run with the stage on masks before offload);
// the gap is the narrow case of a run that offloaded a large value while
// the stage was OFF and is resumed after the stage is turned ON.

Contributor


P1 security Large values bypass masking

When block-output redaction is enabled after a workflow already offloaded large block outputs, this restore path only masks inline strings in the snapshot. The offloaded payloads stay behind large-value refs. On resume or run-from-block, downstream blocks can still read the raw restored payload, and log persistence can skip the large-value scrub because block-output redaction is now enabled. This leaves raw PII reachable from prior block outputs after the stage is turned on.

abortSignal: ctx.abortSignal,
// Propagate in-flight block-output redaction into child workflows so
// nested blocks mask outputs too (recurses: each child forwards it).
piiBlockOutputRedaction: ctx.piiBlockOutputRedaction,


Child workflows skip input redaction

Medium Severity

The new workflow-input PII stage runs only in executeWorkflowCore on top-level processedInput. Nested child runs are started with a direct Executor and pass childWorkflowInput unchanged. Only the block-output policy is forwarded on the context, so when the input stage is on and block outputs are off, mapped or explicit child input can execute and produce downstream state without in-flight input masking.


Reviewed by Cursor Bugbot for commit 8f86d77. Configure here.

@TheodoreSpeaks

Collaborator Author

@greptile review

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).


Reviewed by Cursor Bugbot for commit 6e9587a. Configure here.

Comment thread apps/sim/lib/logs/execution/logger.ts Outdated
Comment on lines +689 to +692
snapshot.state.blockStates = await redactObjectStrings(
  snapshot.state.blockStates,
  blockOutputOpts
)

Contributor


P1 security Refs stay unmasked

When a paused run or run-from-block snapshot contains a large-value ref that was created before block-output redaction was enabled, this call only masks inline strings. Large-value refs are treated as opaque by redactObjectStrings, so the ref still points at the original offloaded bytes. The later warm-up step can materialize that raw value for downstream blocks, letting them read or send unredacted PII even though the block-output stage is enabled.

@TheodoreSpeaks

Collaborator Author

@greptile review

Comment on lines +689 to +692
snapshot.state.blockStates = await redactObjectStrings(
  snapshot.state.blockStates,
  blockOutputOpts
)

Contributor


P1 security Refs stay raw

This restore path still only masks inline strings. When a paused run or run-from-block snapshot contains a large-value ref created before block-output redaction was enabled, redactObjectStrings leaves the ref untouched. The later warm-up can materialize that original offloaded value for downstream blocks, so the resumed workflow can read raw PII even though block-output redaction is now enabled. This path needs to hydrate, mask, and re-store restored refs before downstream state can use them.

@waleedlatif1 waleedlatif1 deleted the branch staging July 1, 2026 05:43
@waleedlatif1 waleedlatif1 reopened this Jul 1, 2026
Comment on lines +689 to +693
  snapshot.state.blockStates = await redactObjectStrings(
    snapshot.state.blockStates,
    blockOutputOpts
  )
}

Contributor


P1 security Large refs remain raw

This restore path still leaves old offloaded block outputs unmasked. It only runs redactObjectStrings over restored blockStates, and that redactor treats large-value refs as opaque, so a paused run or run-from-block snapshot created before block-output redaction was enabled can still point at raw stored bytes. When the restored state is warmed and downstream blocks read that ref, they can receive the original PII even though the block-output stage is enabled. The restore path needs to hydrate, mask, and re-store those refs before exposing the state to execution.
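One possible shape for the "hydrate, mask, and re-store" step requested in this review thread is sketched below. It is not code from the PR: the ref shape (`__largeValueRef`), the `LargeValueStore` interface, and the `Redact` callback are hypothetical stand-ins, since the large-value storage API is not shown in this conversation.

```typescript
// Hypothetical ref shape and storage interface.
interface LargeValueRef {
  __largeValueRef: string;
}

interface LargeValueStore {
  load(key: string): Promise<unknown>;
  // Persists a value and returns the key of the new blob.
  save(value: unknown): Promise<string>;
}

// Stand-in for a redactor configured with the block-output policy.
type Redact = (value: unknown) => Promise<unknown>;

function isRef(value: unknown): value is LargeValueRef {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as LargeValueRef).__largeValueRef === "string"
  );
}

// Walks restored state; every ref is hydrated, masked, and re-stored under a new key.
// Inline strings are left alone here and assumed to be masked by a separate pass.
async function remaskRefs(
  value: unknown,
  store: LargeValueStore,
  redact: Redact
): Promise<unknown> {
  if (isRef(value)) {
    const raw = await store.load(value.__largeValueRef);
    const masked = await redact(raw);
    return { __largeValueRef: await store.save(masked) };
  }
  if (Array.isArray(value)) {
    return Promise.all(value.map((item) => remaskRefs(item, store, redact)));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = await remaskRefs(child, store, redact);
    }
    return out;
  }
  return value;
}
```

Writing the masked value under a new key keeps the sketch simple, but the original raw blob would still exist in storage afterwards; whether to overwrite or delete it is a separate decision this sketch does not make.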


2 participants